Systems and Methods for Forecasting Program Viewership

ABSTRACT

Systems and methods for predicting who is watching a program are disclosed. Text related to the program can be reviewed. Pre-determined genre words and pre-determined keywords can be determined. Words from the text which are relevant words can be determined. How closely the relevant words coincide to the pre-determined genre words can be determined by generating a breakdown of how many relevant words are the pre-determined genre words and the pre-determined keywords. It can be predicted who will watch the program based on the breakdown.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation application of U.S. application Ser. No. 16/444,514, filed Jun. 18, 2019, which claims the benefit of U.S. Provisional Patent Application No. 62/686,402, filed Jun. 18, 2018, the disclosures of which are hereby incorporated by reference in their entireties.

BRIEF DESCRIPTION OF THE DRAWINGS

Various objectives, features, and advantages of the disclosed subject matter can be more fully appreciated with reference to the following detailed description of the disclosed subject matter when considered in connection with the following drawings, in which like reference numerals identify like elements.

FIG. 1 is an example DNA analysis of an algorithm calculated for a movie, according to aspects of the disclosure.

FIG. 2 is a graph of wiki views vs. actual ratings for example programs, according to aspects of the disclosure.

FIG. 3 is an example of a co-occurrence matrix, according to aspects of the disclosure.

FIG. 4 illustrates several main buzz components that can be included in an example, according to aspects of the disclosure.

FIG. 5 is an example of messaging retweet information, according to aspects of the disclosure.

FIG. 6 illustrates an example of known data, gap and forecast periods, according to aspects of the disclosure.

FIG. 7 illustrates an example of gaps which can help evaluate certain features, according to aspects of the disclosure.

FIG. 8 illustrates example machine learning components and example types of data used to make forecasts, according to aspects of the disclosure.

FIG. 9 illustrates an example of how an IMDB identifier can be found using a Wikipedia page, according to aspects of the disclosure.

FIG. 10 illustrates an example algorithm that can be used to detect other hashtags that relate to specific content, according to aspects of the disclosure.

FIG. 11 illustrates an example graph of results for reviews that can be tested, according to aspects of the disclosure.

FIG. 12 illustrates a graph showing a linear regression and a second degree polynomial regression that can be calculated for example series, according to aspects of the disclosure.

FIG. 13 illustrates an example of data that can be collected and/or viewed related to Wikipedia page visits, according to aspects of the disclosure.

FIG. 14 illustrates another example of data that can be collected and/or viewed related to Wikipedia page visits, according to aspects of the disclosure.

FIG. 15 illustrates components and data that can be used in some embodiments of a buzz analysis, according to aspects of the disclosure.

FIG. 16 illustrates an example computer, according to aspects of the present disclosure.

The drawings are not necessarily to scale, or inclusive of all elements of a system, emphasis instead generally being placed upon illustrating the concepts, structures, and techniques sought to be protected herein.

DETAILED DESCRIPTION OF ASPECTS OF THE DISCLOSURE

Systems and methods for forecasting program viewership are provided. First, training (e.g., deep machine learning) can be done against enriched historical data which can result in a forecast model for a program (e.g., a movie, a show in a series). Second, forecasting can be done against the enriched future schedule data utilizing the previously trained model.

Many types of data can be analyzed using the systems and methods described herein, comprising: historical ratings data, historical schedule data, or historical content metadata, or any combination thereof.

Historical Ratings Data. In some embodiments, the core data for these processes can be provided by the customer (e.g., a cable station that wishes to sell advertising for certain programs). The historical ratings data at various levels of granularity can be provided to the customer by a rating agency (e.g., Nielsen, GfK, Kantar, Comscore). The rating agency can collect data from multiple sources, comprising in some embodiments a weighted audience panel. The rating agency data can comprise a summary of ratings measurements (e.g., ratings for demographic groups by time unit); and/or accompanying information for the program (e.g., network, time, program name, duration). In addition to any rating agency data, the customer can have their own rating information, or may have competitor rating information. Additional data sources can also be used by the customer, such as set-top-box or digital device viewing data.

Historical Schedule Data. Basic schedule ratings data can be provided by a ratings agency. In addition, the customer's own scheduling data can be used. It can include additional information on the aired content, content airing patterns (e.g., series, repeats), breakdown of the content with break information (e.g., commercial and promotion). Sources such as online publishers of Electronic Program Guides (EPGs) can also be used for schedule data.

Historical Content Metadata. Ratings agencies may not have much information on the aired program beyond the program name and occasionally a genre descriptor. A broadcaster may have metadata in a Broadcast Management System (BMS) and/or a Rights Management System (RMS). These systems can provide input such as content types, a year of production, a distributor, cast members, box office stats, etc. This type of information may also be obtained by a customer from a third-party resource.

In some embodiments, Machine Learning (ML) and Natural Language Processing (NLP) algorithms can be utilized. In some embodiments, the systems and methods can take into consideration, for example, the following: scheduling time, rating by basic audience demographics, content metadata, content analysis (e.g., which may be referred to herein as a content DNA analysis), audience engagement with content metrics (e.g., which may be referred to herein as BUZZ), competitor viewership impact, series extended trend analysis, or advanced time series analysis, or any combination thereof.

In some embodiments, long data sets (e.g., 2-3 years of historical data) can be used for the forecasting. In other embodiments, shorter data sets (e.g., 4-8 weeks) can be utilized. In other embodiments, long and shorter data sets can be used.

Example Algorithmic Processes

The forecast system can predict future viewership (e.g., sometimes referred to as ratings). In some embodiments, the forecast system can predict viewership: within a specific audience segment, for a specific schedule time, or for a specific playback period (e.g., for a specific scheduled long form content such as event, movie, episode of series, etc.), or any combination thereof. In some embodiments, the forecast system can predict the affinity of an audience segment to a specific schedule time and scheduled content. In other embodiments, a content similarity analysis can be used to better predict viewership success.

In some embodiments, several algorithmic subcomponents can be utilized, including, but not limited to, the following:

-   Time-series forecaster algorithm
-   Content finder/matcher algorithm
-   Content-DNA analysis algorithm
-   Content-DNA analysis enricher algorithm
-   Social-Buzz analysis algorithm
-   Social-Buzz analysis enricher algorithm
-   An overarching algorithm (e.g., a deep learning algorithm such as XGBOOST; see, for example, https://en.wikipedia.org/wiki/Xgboost, which is incorporated by reference in its entirety, for background information).

In some embodiments, the overarching algorithm can comprise multiple components (e.g., any combination of subcomponents 1-7 above), and can also utilize additional forecasting impacting features, including, but not limited to: audience demographic and/or psychographic descriptors, analysis of historical viewership of the specific program or similar programs, broadcast date and/or time related attributes (e.g., date, time, day in year, week day, holidays), content duration and/or advertising break utilization, playback period (e.g., live, same day as live, C3 (viewed with the same commercials within three days of broadcast), or C7 (viewed with the same commercials within seven days of broadcast)), program pattern indicators (e.g., live, network original, network premiere, repeat), or any combination thereof. The overarching algorithm can comprise XGBOOST, or another deep-learning algorithm, such as a decision tree and/or Long Short-Term Memory (LSTM). FIG. 8 illustrates various machine learning components, and also various types of data, that can be used in the systems and methods described herein in order to make forecasts.

Time-Series Forecaster Algorithm

Time-series forecasting algorithms can predict the future based on models fit on historical data. The time-series forecaster component can learn the historical viewership of a given series, and/or the related airing of same/similar content, and can predict the viewership of the next airing based on this analysis.

Multiple time-series models can be used in the forecasting system, comprising a Prophet Algorithm and/or linear and/or polynomial regression. The Prophet algorithm can consider linear and/or logistic growth curve trends, and/or any combination of annual, monthly, and weekly cyclicality, and/or a user-provided list of important time attributes including exceptions such as holidays. Background information on the Prophet Algorithm can be found at https://research.fb.com/prophet-forecasting-at-scale/, which is incorporated by reference in its entirety.

Content Finder/Matcher Algorithm

Carrying out a data enrichment from web sources can require content identification across multiple web sources. Data available for such content identification can be very limited (e.g., only using content name). The forecasting system can use a content finder and/or matcher algorithm based on multiple web sources that tries to find similar aired content based on a series of attributes being evaluated (e.g., similar channel, similar date & time, similar year of release, similar genres, similar cast, similar director). The output of this process can be matched pointers to the content in multiple web sources that can be used for data enrichment. For example, if a user is interested in the Wonder Woman movie from 2017, the content finder and/or matcher algorithm could search multiple pre-designated web sources for “Wonder Woman”, “2017” etc., and keep adding in attributes until only one program is returned. In other embodiments, more than one program can be returned, or the algorithm can indicate a probability that the returned program is the one the user wants.
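The attribute-narrowing behavior described above can be sketched as follows. This is a minimal illustration only, not the disclosed implementation; the function name `narrow_candidates`, the record fields, the naive 1/count confidence, and the sample catalog are all hypothetical.

```python
from typing import Dict, List, Tuple

Record = Dict[str, object]


def narrow_candidates(
    catalog: List[Record], attributes: List[Tuple[str, object]]
) -> Tuple[List[Record], float]:
    """Apply attributes one at a time until at most one candidate remains.

    Returns the remaining candidates and a naive confidence (1 / count),
    which is an assumption made for this sketch.
    """
    candidates = list(catalog)
    for field, value in attributes:
        if len(candidates) <= 1:
            break  # already unique; no need to add more attributes
        filtered = [c for c in candidates if c.get(field) == value]
        if filtered:  # skip an attribute that would eliminate every candidate
            candidates = filtered
    confidence = 1.0 / len(candidates) if candidates else 0.0
    return candidates, confidence


# Hypothetical catalog entries gathered from pre-designated web sources.
catalog = [
    {"title": "Wonder Woman", "year": 2017, "type": "movie"},
    {"title": "Wonder Woman", "year": 1975, "type": "series"},
    {"title": "Wonder Woman", "year": 2009, "type": "movie"},
]

matches, confidence = narrow_candidates(
    catalog, [("title", "Wonder Woman"), ("year", 2017), ("type", "movie")]
)
```

When more than one candidate survives every attribute, the returned list and confidence correspond to the embodiments in which several programs, or a probability, are reported.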

Content DNA Analysis Algorithm

The content DNA engine was developed due to a need to represent the content (e.g., what the content is about) in a numerical manner that could be fed into the overarching algorithm. This can be done by generating a fixed-size vector of numbers (e.g., a score per contextual topic).

In some embodiments, methods and systems for predicting who is watching a program can comprise: reviewing text related to the program, the text comprising: plot information, sub-title information, summary information, script information, or synopsis information, or any combination thereof; determining pre-determined genre words and pre-determined keywords based on machine learning analysis of historical programs; determining which words from the text are relevant words, the relevant words being words that help identify genre words or keywords; determining how closely the relevant words coincide with the pre-determined genre words by generating a DNA breakdown of how many relevant words are the pre-determined genre words and the pre-determined keywords; and predicting who will watch the program based on the DNA breakdown.
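The counting step behind such a DNA breakdown can be sketched as follows. This is a simplified illustration that assumes the genre words and keywords are already available as plain word sets; the function name `dna_breakdown`, the topic names, and the vocabularies are hypothetical, and the machine learning analysis that produces the word sets is not reproduced here.

```python
import re
from collections import Counter
from typing import Dict, Set


def dna_breakdown(
    text: str, genre_words: Dict[str, Set[str]], keywords: Dict[str, Set[str]]
) -> Dict[str, float]:
    """Share of relevant words falling into each genre or keyword topic."""
    tokens = re.findall(r"[a-z']+", text.lower())
    topics = {**genre_words, **keywords}
    counts: Counter = Counter()
    for token in tokens:
        for topic, vocabulary in topics.items():
            if token in vocabulary:  # a relevant word for this topic
                counts[topic] += 1
    total = sum(counts.values())
    return {t: (counts[t] / total if total else 0.0) for t in topics}


# Hypothetical pre-determined vocabularies.
genre_words = {"drama": {"trial", "hope"}, "crime": {"murder", "robbery"}}
keywords = {"prison": {"prison", "warden", "inmate"}}

breakdown = dna_breakdown(
    "An inmate sentenced for murder finds hope inside the prison "
    "run by a cruel warden.",
    genre_words,
    keywords,
)
```

In this toy input, three of the five relevant words fall in the hypothetical "prison" keyword topic, so that topic receives the largest share of the breakdown.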

There are many data sets available (e.g., web, services) that can provide attributed topics (e.g., genre) for content. In some embodiments, rather than use a limited topic set and binary values (e.g., content is romance (1) + comedy (1), but other topics get a (0)), a scoring value (e.g., content is 70% romance and 30% comedy) can be used. In this way, deep learning technology can be used with a large content data set. An artificial neural network (ANN) can thus be used on thousands of programs (e.g., movies and series episodes). For example, in some embodiments a database including information on 85,000 movies and 50,000 series can be utilized. The input layer to the ANN can be text about and/or from all the programs. The text can comprise commonly used short synopsis, plot description, original scripts (when available) and subtitles. In some embodiments, subtitle information can be used before the other types of text, as it can generate more accurate forecasting.

The text can be transformed from word embeddings to a fixed-size vector of length 780 using, for example, Stanford's GloVe Algorithm. Background information on the GloVe Algorithm can be found at https://nlp.stanford.edu/projects/glove/, which is incorporated by reference in its entirety.

Each element in the vector can represent a “cluster” of words, and the value of the cluster can be the amount of “presence” of this cluster in the text of the content.

Each cluster can represent a general concept and/or subject and/or idea, and each word in the cluster can relate to this idea. For example, two example generated clusters can be:

-   {india, goa, Bengal, assam, Ceylon, indian, Andhra, tamil}
-   {scam, embezzlement, obstruction, bribery, payoff, cheating, evasion, insider, rigging, fixing, forgery, scheme, corruption, blackmail, multimillion, perjury, theft, misuse, racketeering, extortion, fraud, graft, collusion, peddling, bribe}

In the second cluster, there are words that are not directly related (e.g., multimillion and bribe), but because the algorithm can detect that those words often come together, they can be clustered together.

The output layer of the ANN can initially be genres. For example, in some embodiments, a selected list of 26 values picked from Internet Movie Database (e.g., IMDB.com), Wikipedia, and an entity's Broadcast Management System (IBMS) can be utilized as genres. In this way, the ANN can actually learn to predict the genres of the program and their significance from the text of the program.

In addition, in some embodiments, the accuracy of the prediction can be increased by further improving the DNA values by adding the ability to score against commonly used keywords associated with content. A large matrix of programs × keywords associated with these programs can be used. For example, 16,000+ programs and thousands of keywords associated with those programs can be used. For each of the 16,000+ movies, a bitmap against all the keywords can be created. A 1 can be designated if the keyword belongs to the movie and a 0 can be designated if not. A pre-defined threshold can be applied, and keywords that appear fewer times than the pre-defined threshold can be deleted.
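The bitmap construction and threshold filtering described above can be sketched as follows. This is a minimal illustration; the function name `build_bitmaps`, the program names, the keywords, and the threshold value are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def build_bitmaps(
    program_keywords: Dict[str, Set[str]], min_count: int
) -> Tuple[List[str], Dict[str, List[int]]]:
    """Drop rare keywords, then build one 0/1 bitmap per program."""
    counts = Counter(kw for kws in program_keywords.values() for kw in kws)
    # Keep only keywords that appear at least `min_count` times overall.
    vocabulary = sorted(kw for kw, n in counts.items() if n >= min_count)
    bitmaps = {
        program: [1 if kw in kws else 0 for kw in vocabulary]
        for program, kws in program_keywords.items()
    }
    return vocabulary, bitmaps


# Hypothetical keyword assignments for three programs.
programs = {
    "movie_a": {"prison", "escape", "friendship"},
    "movie_b": {"prison", "christmas"},
    "movie_c": {"christmas", "friendship", "snow"},
}

vocabulary, bitmaps = build_bitmaps(programs, min_count=2)
```

With a threshold of 2, the keywords "escape" and "snow" (each used once) are removed, and every program is left with a bitmap over the three surviving keywords.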

A co-occurrence matrix (e.g., of dimension (number of keywords) × (number of keywords), indicating how many times each keyword appeared with each other keyword) can be created. An example of a co-occurrence matrix is shown in FIG. 3. This matrix can be normalized by the matrix diagonal, which can indicate how many times a keyword appeared with itself, or in other words how many times it appeared in general; each row can be divided by its diagonal value.
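The co-occurrence counting and diagonal normalization can be sketched as follows. This is a minimal illustration in plain Python; the function names and the sample bitmaps (rows are programs, columns are keywords) are hypothetical.

```python
from typing import List


def cooccurrence(bitmaps: List[List[int]]) -> List[List[int]]:
    """Count, for every keyword pair, how many programs contain both."""
    n = len(bitmaps[0])
    matrix = [[0] * n for _ in range(n)]
    for row in bitmaps:
        present = [i for i in range(n) if row[i]]
        for i in present:
            for j in present:
                matrix[i][j] += 1
    return matrix


def normalize_by_diagonal(matrix: List[List[int]]) -> List[List[float]]:
    """Divide each row by its diagonal entry (the keyword's total count)."""
    return [
        [value / row[i] if row[i] else 0.0 for value in row]
        for i, row in enumerate(matrix)
    ]


# Hypothetical bitmaps over the keywords (christmas, friendship, prison).
bitmaps = [
    [0, 1, 1],
    [1, 0, 1],
    [1, 1, 0],
    [1, 1, 0],
]

matrix = cooccurrence(bitmaps)
normalized = normalize_by_diagonal(matrix)
```

After normalization, each row can be read as a vector describing how often the other keywords accompany that row's keyword, and the diagonal entries equal 1 for any keyword that appears at least once.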

At this stage, a vector representation for each keyword can be created. The keywords can then be clustered using the vector representations (e.g., using a co-occurrence algorithm in some embodiments). In this way, clusters of similar keywords can be generated, which clusters can be based on the likelihood that those keywords will appear together. The calculated clusters can be reviewed manually and the clusters can be labeled with a name.

For example, the following words could be in a cluster:

-   christmas
-   christmas-carol
-   christmas-decorations
-   christmas-eve
-   christmas-gift
-   christmas-lights
-   christmas-party
-   christmas-present
-   christmas-tree

This cluster can be labeled “Christmas”. When keyword clusters such as these (e.g., 35 clusters such as Christmas, birth/children, New York, police) are added to the 26 genres (e.g., drama, comedy, sci-fi), the total length of the DNA representation is, in this example, 26+35=61. In some embodiments, both genres and keyword clusters can be customized. In other embodiments, the number and names of genres can be fixed, but the keyword clusters can be customized. For example, a user could determine to use any number of keyword clusters (e.g., 35 to 135).

Using this methodology, FIG. 1 provides an example DNA analysis of the algorithm calculated for the movie “Shawshank Redemption”. The gold/yellow bars are the “standard” commonly-used genres, and the silver/gray bars are the additional keywords that were added that allowed the programs to be represented in more depth (e.g., in the Shawshank Redemption example, the keyword “prison” was the most significant).

Content DNA Analysis Enricher Algorithm

The role of the DNA enricher can be to obtain text about the movie and pass it through the Content DNA engine to generate the topic scoring. Once the content is identified, the enricher can look for the subtitles, or for other text related to the movie (a fallback process), and can generate the DNA.

We frequently use the IMDB ID of the movie as a validator for the correct subtitles (e.g., locating subtitle files named with the IMDB ID reference).

Social-Buzz Analysis Algorithm

The social buzz component can provide insights, collected from various web data sources, about the “trendiness” of programs. For example, FIGS. 6 and 7 illustrate some example buzz data. To-date data referencing the aired content from sources like social media (e.g., Twitter, Facebook) and page visit statistics (e.g., Wikipedia) can be collected and used to train a machine learning (ML) model (e.g., the XGBOOST model) that can learn, given a “BUZZ pattern”, the predicted viewership of a particular program. As much of this data can exist long before the content goes to air, it can provide a unique source of potential insight on intent to watch this content. There can be a significant correlation between the BUZZ pattern and the ratings. For example, there can be a significant correlation between the wiki views about the show and the ratings, even when we take just the number of views (e.g., monthly averages) before the actual broadcast.

The deep learning algorithm can be trained to take the number of wiki views as its input, and the ratings as the output. The deep learning algorithm can predict the ratings fairly well, even without any other feature provided, using only the number of wiki exposures to the content. Other types of social media and content related page visits can also be utilized.

FIG. 2 shows a graph of wiki views vs. actual ratings for selected programs on FOX.

Social-Buzz Analysis Enricher Algorithm

Like the DNA enricher, the role of the social buzz enricher can be to collect data from selected websites on the buzz around the content, while validating it against several parameters.

The collected data can be summarized as a buzz “best features” analysis and added as controlled input to the overarching algorithm.

Overarching Algorithm

The overarching algorithm can use the above components, as well as many other components known to those of ordinary skill in the art, to create a prediction of audience viewership. It can include, in addition to commonly used attributes, a deep analysis of the content context and/or the audience engagement (buzz) with the content, and thus analyze mass amounts of historical data in an advanced manner. The deep learning boosting algorithm (e.g., XGBOOST) can be used with added operational controls.

Example Algorithms and Features

BUZZ Algorithm. FIG. 4 illustrates several main buzz components that can be included in an example embodiment. For example, a buzz analysis can be performed for the program to determine how much social media is generated related to the program, wherein the buzz analysis can comprise: searching pre-determined sources for identifying information, the identifying information comprising: an Internet Movie Database (IMDB) identifier, an official hashtag, an official program name, an actor name, or a director name, or any combination thereof, wherein the pre-determined sources comprise Wikipedia, Twitter, Facebook, or web pages, or any combination thereof; counting how many times a day the identifying information is mentioned in the pre-determined sources; and providing a buzz rating based on the counting.

FIG. 9 provides an example of how an IMDB identifier can be found using a Wikipedia page. FIGS. 13 and 14 illustrate examples of data that can be collected and/or viewed related to Wikipedia page visits.

FIG. 15 illustrates components and data that can be used in some embodiments of a buzz analysis. The following BUZZ algorithm can be used in some embodiments:

$$\mathrm{BUZZ} = \sum_{1}^{n} (CP + SM + SNT) \cdot RG \cdot TS$$

where:

-   CP: Content Pages. Related web page daily number of visits.
-   SM: Factored (see below) Social Media daily mentions.
-   SNT: Daily average Sentiment Scoring. In some embodiments, the host's own machine learning based user commentary sentiment scoring can be used.
-   RG: Regional source relevancy factor. Can evaluate how relevant a specific data source is to the content airing region.
-   TS: Time shift factor (see below).
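The BUZZ formula can be sketched in code as follows. This is a minimal illustration that assumes the sum runs over n daily observations and that RG and TS are constant for the data source being evaluated (so applying them inside or outside the sum gives the same result); the class name `DailyBuzz`, the function name `buzz_score`, and the sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class DailyBuzz:
    cp: float   # CP: related web page visits for the day
    sm: float   # SM: factored social media mentions for the day
    snt: float  # SNT: average sentiment score for the day


def buzz_score(days: Iterable[DailyBuzz], rg: float, ts: float) -> float:
    """Sum (CP + SM + SNT) over the days, scaled by RG and TS."""
    return sum(d.cp + d.sm + d.snt for d in days) * rg * ts


# Hypothetical two-day observation window.
days = [DailyBuzz(1000, 250, 0.5), DailyBuzz(1200, 300, 0.5)]
score = buzz_score(days, rg=0.5, ts=0.5)
```

Because CP and SM are raw counts while SNT is a score, an implementation may need to rescale the terms before summing; the sketch leaves the inputs unscaled.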

Social Media Count. In some embodiments, the factored Social Media (SM) mentions (e.g., count) can be determined as follows:

$$SM = \sum (PRT + SMP + SMS \cdot SMSr)$$

(A portion of this equation was marked as missing or illegible when filed.)

where:

-   PRT: Primary account (@) retweets. Can comprise main account daily messaging retweets. For example, if given @westworldhbo, the information in FIG. 5 can be found.
-   SMP: Primary content hashtag (#) mentions. A content-specific hashtag can ensure unique content identification. For example, #empireFOX ensures that not just any “empire”-related mention is checked, as in the following message:
    -   Empire Boo Boo Kitty @EmpireBBK·Sh
    -   Before You Fall Asleep Everyday, Say Something Positive To Yourself: #empire #empirefox #inspiration

    In some cases multiple options could be used as a primary hashtag. For example, #westworld and #westworldhbo.
-   SMS: Secondary content hashtag (#) mentions. For example, a main cast member, director, or a specific season, such as: #jonahnolan
-   SMSr: Secondary content hashtag (#) relevancy factor. This can factor in the likelihood that a secondary hashtag mention is contributing to BUZZ. For example, a season-specific hashtag during airing can be set as extremely relevant (e.g., a 1.0 factor), while an actor hashtag mentioned on its own can be set as less relevant (e.g., a 0.2 factor).
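The factored mention count can be sketched as follows for a single day. This is a minimal illustration that assumes each secondary hashtag carries its own SMSr relevancy factor and that the weighted secondary counts are summed; the function name `factored_mentions`, the hashtags, and the counts are hypothetical.

```python
from typing import Dict


def factored_mentions(
    prt: int,
    smp: int,
    secondary_counts: Dict[str, int],
    relevancy: Dict[str, float],
) -> float:
    """PRT + SMP + sum of (SMS * SMSr) over secondary hashtags for one day."""
    weighted_secondary = sum(
        count * relevancy.get(tag, 0.0)  # unknown tags contribute nothing
        for tag, count in secondary_counts.items()
    )
    return prt + smp + weighted_secondary


# Hypothetical daily counts and relevancy factors.
sm = factored_mentions(
    prt=120,
    smp=400,
    secondary_counts={"#season2": 50, "#leadactor": 200},
    relevancy={"#season2": 1.0, "#leadactor": 0.2},
)
```

In this toy input the season hashtag counts in full (50) while the actor hashtag is discounted to 40, for a factored total of about 610.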

Time Shift Factor. Some data sources can provide user engagement during original airing of content. In some embodiments, the BUZZ algorithm can use information about a schedule in a different region or about a repeated airing long after the original airing. Examples comprise: a movie airing on TV (e.g., several months after the theatrical release), an originally-aired American network content being licensed to a different country, or a rerun of a series.

The following time shift algorithm can be used:

TS=K*(1/D*(Oa−Ea))

-   K: A set constant for each Time Shift reason (e.g., for a movie gap from a theatrical release, for a regional gap for an origin country release, or a time gap for an original release in the same region). The constant can be set by Machine Learning training on rating relations between the time shifted airing across the regions and/or sources.
-   D: Time decay factor. This can be how fast the BUZZ impact is reduced over a timeline.
-   Oa: Original airing and/or theater release date.
-   Ea: Evaluated airing date.

For example, FIG. 6 illustrates known data, gap and forecast periods. The BUZZ value can be calculated for historical data. However, for forecasting, the number of features generated can be limited to avoid performance problems and overfitting-related forecasting errors (e.g., see https://en.wikipedia.org/wiki/Overfitting, which is incorporated by reference for more information). Utilizing machine learning, specific gaps that evaluate the best pre-determined number of features (e.g., 10) can be used. FIG. 7 provides an example where 5 daily values from the week before the data gap plus the previous 5 weekly averages (for a total of 10) are used.
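A 10-feature selection of this kind can be sketched as follows. This is a minimal illustration that assumes the daily BUZZ series ends on the last known day before the data gap, that the 5 daily values are the most recent 5 days of the final week, and that the weekly averages cover the 5 weeks immediately preceding that final week; the function name `buzz_features` and the sample series are hypothetical.

```python
from statistics import mean
from typing import List, Sequence


def buzz_features(
    daily: Sequence[float], n_daily: int = 5, n_weeks: int = 5
) -> List[float]:
    """Return n_daily recent daily values followed by n_weeks weekly averages."""
    needed = 7 + 7 * n_weeks  # final week plus the averaged weeks
    if len(daily) < needed:
        raise ValueError(f"need at least {needed} daily values")
    last_week = daily[-7:]
    daily_part = list(last_week[-n_daily:])
    earlier = daily[-needed:-7]
    # Weekly averages, oldest week first.
    weekly_part = [mean(earlier[i:i + 7]) for i in range(0, 7 * n_weeks, 7)]
    return daily_part + weekly_part


# Hypothetical 42-day BUZZ series: day 1 has value 1.0, day 42 has value 42.0.
series = [float(day) for day in range(1, 43)]
features = buzz_features(series)
```

Keeping the feature count fixed regardless of how much history exists is one way to bound the model input size, in line with the overfitting concern noted above.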

For example:

For data provided for US primetime forecasting over a 2-month data gap, the following BUZZ pattern was established as most effective:

Secondary Hashtag Evaluation. As explained above, hashtags can be used to identify topics of social media discussions. Long-form content (e.g., movies and/or series) can be promoted via a common main hashtag (e.g., defined by an originator and/or publisher). Related sub-hashtags can evolve to denote a character, cast, plotline, season, etc. These sub-hashtags can be used on their own in sub-conversations that are still relevant for identifying user engagement (e.g., BUZZ) with the evaluated content.

As these sub-hashtags may not be published and can be created by users in a more organic fashion, identifying them correctly can require either manual or algorithmic effort.

In some embodiments, an algorithm can be used that detects, based on pre-defined keywords (e.g., hashtags), other hashtags that relate to specific content one wants to follow. The algorithm can establish (e.g., using machine learning) the co-occurrence of other hashtags with the pre-defined (e.g., main) hashtags, and can decide, based on a pre-defined threshold, the significance of the keyword in relation to the initial words. FIG. 10 illustrates an example of this algorithm when #Empire is the pre-defined word. We can see secondary hashtags for the series finale, for season 2, for the main character (e.g., Cookie), etc. in FIG. 10.
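The co-occurrence and threshold step can be sketched as follows. This is a simplified illustration that uses a plain co-occurrence ratio rather than a learned model; the function name `related_hashtags`, the threshold value, and the sample posts are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def related_hashtags(
    posts: Iterable[Set[str]], main_tag: str, threshold: float
) -> Dict[str, float]:
    """Hashtags whose share of main-tag posts meets the threshold."""
    main_posts = [tags for tags in posts if main_tag in tags]
    if not main_posts:
        return {}
    counts = Counter(
        tag for tags in main_posts for tag in tags if tag != main_tag
    )
    total = len(main_posts)
    return {
        tag: n / total for tag, n in counts.items() if n / total >= threshold
    }


# Hypothetical posts, each reduced to its set of lowercase hashtags.
posts = [
    {"#empire", "#cookie"},
    {"#empire", "#cookie", "#season2"},
    {"#empire", "#monday"},
    {"#empire", "#cookie"},
    {"#cookie", "#baking"},
]

secondary = related_hashtags(posts, main_tag="#empire", threshold=0.5)
```

Here "#cookie" appears in three of the four "#empire" posts and passes the 0.5 threshold, while "#season2" and "#monday" do not; the last post is ignored because it lacks the main hashtag.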

Sentiment Analyzer. Many types of sentiment analyzers (e.g., tweetfeel or steamcrab for twitter, Newssift for online news) or generic engines (e.g., using revealed context, or IBM Watson) may be used in some embodiments. These solutions may report mainly 3 levels of response (e.g., positive, neutral, negative). They can evaluate vocabulary across all contexts (e.g. they can fit any product or service sentiment evaluation).

In other embodiments, an algorithm can be used that analyzes opinionated text about a movie (e.g., IMDB reviews) to determine the sentiment at a finer resolution (e.g., a ranking from 1 to 10). The algorithm can use NLP to pre-process the text and to make a representation of it that can be fed into an ANN. The pre-processing can be similar to the one described in the DNA section, and can be based on Stanford's GloVe algorithm for word embedding, as well as on a K-means algorithm.

In a second stage, the algorithm can train an ANN on IMDB reviews, where the input can be the text representation, and the output can be the user ratings. In this way, the ANN can learn to predict what will be the rating from the text.

In an example training process, we can train on 1,220,000 reviews from 27,000 movies. The results on 120,000 reviews can be tested. FIG. 11 illustrates a graph of these results, with the window size being the difference between the prediction and the real value. We see that almost 30% of the time, the results were accurate (e.g., while on a random guess the chances may be only 1/10). If the error tolerance grows to 1 (e.g., if the actual rating was 9, the prediction could be 8, 9, or 10), the accuracy can grow to 50%, and can keep growing (as expected) as the error tolerance gets higher and higher.
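The window-based accuracy measure can be sketched as follows. This is a minimal illustration; the function name `accuracy_within` and the sample ratings are hypothetical and are not the results reported above.

```python
from typing import Sequence


def accuracy_within(
    predicted: Sequence[int], actual: Sequence[int], tolerance: int
) -> float:
    """Fraction of predictions within `tolerance` of the real rating."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(abs(p - a) <= tolerance for p, a in zip(predicted, actual))
    return hits / len(actual)


# Hypothetical ratings on a 1-to-10 scale.
actual = [9, 7, 3, 10, 5]
predicted = [8, 7, 6, 10, 4]

exact = accuracy_within(predicted, actual, tolerance=0)
within_one = accuracy_within(predicted, actual, tolerance=1)
```

With these toy values, two of five predictions are exact and four of five fall within a window of 1, which mirrors how accuracy grows as the tolerance widens.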

Carry-Over Analysis. A parameter that can affect the ratings prediction of a specific timed content is the carry-over effect (e.g., the lead-in analysis), which can evaluate the impact of the viewership of the previous program. This can have to do with many reasons (e.g., viewer habits of TV remote usage, level of engagement with content while watching TV, promo effort at the end of content to the coming next content, live vs. recorded watching habits, etc.). One challenge with the carry-over analysis is that long-term forecasting can be complicated, and the inaccuracy of the main program forecast combined with the impact of the rating of the previous program can potentially increase noise rather than provide a clear signal.

To avoid double forecasting (e.g., a rough estimate based on minimal input, followed by an accurate estimate using mass input and “carrying over” the rating from the previous program), we can train the algorithm only once, but feed it with the parameters of the previous program (e.g., whether the previous program is live and/or a replay, the ratings the previous program got in the past, etc.). When the correct feature set of the previous program is selected, the forecasting accuracy can be improved.
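Feeding the previous program's parameters into a single training pass can be sketched as follows. This is a minimal illustration that assumes the schedule is a time-ordered list for one channel; the function name `add_lead_in_features`, the field names, and the sample schedule are hypothetical.

```python
from typing import Dict, List, Optional


def add_lead_in_features(
    schedule: List[Dict[str, object]]
) -> List[Dict[str, object]]:
    """Copy each slot and attach attributes of the program aired before it."""
    rows: List[Dict[str, object]] = []
    previous: Optional[Dict[str, object]] = None
    for slot in schedule:
        row = dict(slot)  # leave the input schedule untouched
        row["prev_is_live"] = previous["is_live"] if previous else None
        row["prev_past_rating"] = previous["past_rating"] if previous else None
        rows.append(row)
        previous = slot
    return rows


# Hypothetical two-slot evening schedule.
schedule = [
    {"program": "News at 8", "is_live": True, "past_rating": 2.5},
    {"program": "Drama Hour", "is_live": False, "past_rating": 1.75},
]

rows = add_lead_in_features(schedule)
```

The lead-in columns use only information known in advance (historical ratings and program attributes), so no separate forecast of the previous program is needed.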

Competitor(s) Analysis. Viewership can be impacted by the viewing choices put in front of the audience. While linear viewing habits still exist, and some viewers randomly zap across channels and stop when engaged with currently played content, the shift to planned viewing (e.g., recorded or planned by listing and/or an Electronic Programming Guide (EPG)) is rising. This can lead to competition for the viewers' time, which can mean impact analysis may be required of content played at the same time across all available channels. Thus, in some embodiments, features taken from the main competitors of the predicted channel can be used in the algorithm. This can help to further improve the accuracy of the forecast.

Features such as the content DNA and/or BUZZ for the competitors' content can teach the neural network the relationship between the competitor's popularity and DNA and the likelihood of swapping to view the competitor's content (e.g., what percentage of specific demo users prefer to watch a popular police and/or drama series on a competitor's channel against the predicted spots event). While historical analysis (e.g., machine learning training) against accurate historical ratings can ensure the model built for competition analysis can produce good forecasting results, we can add a degree of noise when referencing the competitors' future scheduling, especially for long-term forecasting, because these schedules can change at the last minute without long notification. In some embodiments, we also use a competitor's partial future schedule, as these may be published for a limited period in advance (e.g., a few weeks up to a quarter), while the main schedule that is analysed can be for a year or two in advance.

Series Sloping. Tree-based algorithms, like XGBOOST, may not be very good at predicting values outside of the range they already saw. That means, for example, that if our sequence is {5,4,3,2,1} the next predictions could be closer to {1,1,1, . . . } than to {0,−1,−2, . . . }. In other words: XGBOOST can be very good at interpolating but less good at extrapolating.

In order to give the algorithm the capability to predict “outside of the box”, we can include series sloping regressions, that can be linear and/or polynomial, in order to also take the long term trend into account. The graph in FIG. 12 shows the linear regression and the second degree polynomial regression that can be calculated for two example series (e.g., AMERICA'S GOT TALENT & AMERICAN NINJA WAR).
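As a rough sketch of how such sloping features might be computed, the following fits a linear and a second degree polynomial regression to a series by ordinary least squares. The helper names polyfit, polyval, and sloping_features are hypothetical, as is the choice of which values to expose as features; the disclosure does not specify them.

```python
def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients [c0, c1, ...] (lowest power first)."""
    n = degree + 1
    # Augmented normal-equation matrix [A^T A | A^T y].
    m = [[sum(x ** (i + j) for x in xs) for j in range(n)]
         + [sum(y * x ** i for x, y in zip(xs, ys))]
         for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def polyval(coeffs, x):
    # Evaluate c0 + c1*x + c2*x^2 + ...
    return sum(c * x ** k for k, c in enumerate(coeffs))


def sloping_features(series, horizon=3):
    """Return trend features for a series observed at time steps 0..len-1."""
    xs = list(range(len(series)))
    linear = polyfit(xs, series, 1)
    quadratic = polyfit(xs, series, 2)
    steps = range(len(series), len(series) + horizon)
    return {
        "linear_slope": linear[1],
        "quadratic_curvature": quadratic[2],
        "linear_forecast": [polyval(linear, t) for t in steps],
        "quadratic_forecast": [polyval(quadratic, t) for t in steps],
    }


features = sloping_features([5, 4, 3, 2, 1])
print(features["linear_slope"])
print(features["linear_forecast"])
```

For the sequence {5,4,3,2,1}, the linear fit has a slope of −1 and its forecast continues toward {0,−1,−2}, which is the long term trend a tree-based model alone would miss. Values like these could be supplied to the model as additional input features.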

Example Computer Implementation

Methods described herein may represent processing that occurs within a system. The subject matter described herein can be implemented in digital electronic circuitry, or in computer software, firmware, or hardware, including the structural means disclosed in this specification and structural equivalents thereof, or in combinations of them. The subject matter described herein can be implemented as one or more computer program products, such as one or more computer programs tangibly embodied in an information carrier (e.g., in a machine readable storage device), or embodied in a propagated signal, for execution by, or to control the operation of, data processing apparatus (e.g., a programmable processor, a computer, or multiple computers). A computer program (also known as a program, software, software application, or code) can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program does not necessarily correspond to a file. A program can be stored in a portion of a file that holds other programs or data, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub programs, or portions of code). A computer program can be deployed to be executed on one computer or on multiple computers at one site or distributed across multiple sites and interconnected by a communication network.

The processes and logic flows described in this specification, including the method steps of the subject matter described herein, can be performed by one or more programmable processors (e.g., processor 1610 in FIG. 16) executing one or more computer programs to perform functions of the subject matter described herein by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus of the subject matter described herein can be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application specific integrated circuit).

FIG. 16 illustrates an example computer 1605, according to some embodiments of the present disclosure. Computer 1605 can include a processor 1610 suitable for the execution of a computer program, and can include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. A processor can receive instructions and data from a memory 1630 (e.g., a read only memory or a random access memory or both). Processor 1610 can execute instructions and the memory 1630 can store instructions and data. A computer can include, or be operatively coupled to receive data from or transfer data to, or both, a storage medium 1640 for storing data (e.g., magnetic, magneto optical disks, or optical disks). Information carriers suitable for embodying computer program instructions and data can include all forms of nonvolatile memory, including by way of example semiconductor memory devices, such as EPROM, EEPROM, flash memory devices, or magnetic disks. The processor 1610 and the memory 1630 can be supplemented by, or incorporated in, special purpose logic circuitry.

The computer 1605 can also include an input/output 1620, a display 1650, and a communications interface 1660.

CONCLUSION

It is to be understood that the disclosed subject matter is not limited in its application to the details of construction and to the arrangements of the components set forth in the following description or illustrated in the drawings. The disclosed subject matter is capable of other embodiments and of being practiced and carried out in various ways. Accordingly, other implementations are within the scope of the following claims. Also, it is to be understood that the phraseology and terminology employed herein are for the purpose of description and should not be regarded as limiting. As such, those skilled in the art will appreciate that the conception, upon which this disclosure is based, may readily be utilized as a basis for the designing of other structures, methods, and systems for carrying out the several purposes of the disclosed subject matter. It is important, therefore, that the claims be regarded as including such equivalent constructions insofar as they do not depart from the spirit and scope of the disclosed subject matter.

While the disclosure has been described with reference to numerous specific details, one of ordinary skill in the art will recognize that the disclosure can be embodied in other specific forms without departing from the spirit of the disclosure. In addition, a number of the figures illustrate processes. The specific operations of these processes may not be performed in the exact order shown, described, and/or claimed. The specific operations may not be performed in one continuous series of operations, and different specific operations may be performed in different embodiments. Thus, one of ordinary skill in the art would understand that the invention is not to be limited by the foregoing illustrative details, but rather is to be defined by the appended claims.

While various embodiments have been described above, it should be understood that they have been presented by way of example and not limitation. It will be apparent to persons skilled in the relevant art(s) that various changes in form and detail may be made therein without departing from the spirit and scope. In fact, after reading the above description, it will be apparent to one skilled in the relevant art(s) how to implement alternative embodiments. Thus, the present embodiments should not be limited by any of the above-described embodiments.

In addition, it should be understood that any figures which highlight the functionality and advantages are presented for example purposes only. The disclosed methodology and system are each sufficiently flexible and configurable such that they may be utilized in ways other than that shown.

Further, the purpose of any Abstract of the Disclosure is to enable the U.S. Patent and Trademark Office and the public generally, and especially the scientists, engineers and practitioners in the art who are not familiar with patent or legal terms or phraseology, to determine quickly from a cursory inspection the nature and essence of the technical disclosure of the application. An Abstract of the Disclosure is not intended to be limiting as to the scope of the present invention in any way.

Although the term “at least one” may often be used in the specification, claims and drawings, the terms “a”, “an”, “the”, “said”, etc. also signify “at least one” or “the at least one” in the specification, claims and drawings.

Additionally, the terms “including”, “comprising” or similar terms in the specification, claims and drawings should be interpreted as meaning “including, but not limited to.”

Finally, it is the applicant's intent that only claims that include the express language “means for” or “step for” be interpreted under 35 U.S.C. 112, paragraph 6. Claims that do not expressly include the phrase “means for” or “step for” are not to be interpreted under 35 U.S.C. 112, paragraph 6.

1. A method for predicting who is watching a program, the method comprising: reviewing text related to the program, the text comprising: plot information, sub-title information, summary information, script information, or synopsis information, or any combination thereof; determining pre-determined genre words and pre-determined keywords based on machine learning analysis of historical programs; determining which words from the text are relevant words, the relevant words being words that help identify genre words or keywords; determining how closely the relevant words coincide to the pre-determined genre words by generating a DNA breakdown of how many relevant words are the pre-determined genre words and the pre-determined keywords; and predicting who will watch the program based on the DNA breakdown.